Index-Based Approximate XML Joins
Authors
Abstract
XML data integration tools face a variety of challenges to operate efficiently and effectively. Among these is the requirement to handle inconsistencies or mistakes present in the data sets. In this paper we study the problem of integrating XML data sources through index-assisted join operations, using notions of approximate match in the structure and content of XML documents as the join predicate. We show how a well-known and widely deployed index structure, namely the R-tree, can be adapted to improve the performance of such operations. We propose novel search and join algorithms for R-trees adapted to index XML document collections. We also propose novel optimization objectives for R-tree construction, making R-trees better suited to this application.
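The abstract does not say how documents are placed in the R-tree or which approximate-match measure is used. As a minimal sketch, assuming each XML document has already been mapped to a point in a low-dimensional vector space and that "approximate match" is modelled as Euclidean distance of at most `eps` (both assumptions, not the paper's stated method), an index-assisted join can traverse two R-trees in lockstep and prune any pair of nodes whose bounding rectangles are farther apart than `eps`. The names `build`, `mindist`, and `join` are hypothetical, and the bulk loader is a simple sort-and-pack scheme, not the construction objectives the paper proposes.

```python
"""Hypothetical sketch of an epsilon-distance join over two R-trees.

Assumption (not from the paper): every XML document has already been mapped
to a point, and "approximate match" is modelled as Euclidean distance <= eps.
"""
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]
Entry = Tuple[str, Point]  # (document id, embedded point)


@dataclass
class Node:
    lo: Point  # lower corner of the node's minimum bounding rectangle (MBR)
    hi: Point  # upper corner of the MBR
    children: List[Node] = field(default_factory=list)  # empty for leaves
    items: List[Entry] = field(default_factory=list)  # filled only in leaves

    @property
    def is_leaf(self) -> bool:
        return not self.children


def mbr(points: List[Point]) -> Tuple[Point, Point]:
    """Smallest axis-aligned box enclosing all given points."""
    dims = range(len(points[0]))
    return (tuple(min(p[d] for p in points) for d in dims),
            tuple(max(p[d] for p in points) for d in dims))


def build(entries: List[Entry], fanout: int = 4) -> Node:
    """Bulk-load a static R-tree (simplified: sort on coordinates, then pack).

    Assumes at least one entry.
    """
    ordered = sorted(entries, key=lambda e: e[1])
    level = []
    for i in range(0, len(ordered), fanout):
        chunk = ordered[i:i + fanout]
        lo, hi = mbr([p for _, p in chunk])
        level.append(Node(lo, hi, items=chunk))
    while len(level) > 1:  # pack nodes into parents until one root remains
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            lo, hi = mbr([n.lo for n in group] + [n.hi for n in group])
            parents.append(Node(lo, hi, children=group))
        level = parents
    return level[0]


def mindist(a: Node, b: Node) -> float:
    """Lower bound on the distance between any point in a and any point in b."""
    total = 0.0
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        gap = max(blo - ahi, alo - bhi, 0.0)  # 0 when the intervals overlap
        total += gap * gap
    return math.sqrt(total)


def join(a: Node, b: Node, eps: float, out: List[Tuple[str, str]]) -> None:
    """Synchronized traversal: descend both trees, pruning far-apart node pairs."""
    if mindist(a, b) > eps:
        return  # no document pair below these two nodes can match
    if a.is_leaf and b.is_leaf:
        for ida, pa in a.items:
            for idb, pb in b.items:
                if math.dist(pa, pb) <= eps:  # exact check on candidates
                    out.append((ida, idb))
    elif a.is_leaf:
        for cb in b.children:
            join(a, cb, eps, out)
    elif b.is_leaf:
        for ca in a.children:
            join(ca, b, eps, out)
    else:
        for ca in a.children:
            for cb in b.children:
                join(ca, cb, eps, out)


if __name__ == "__main__":
    docs_a = [("a1", (0.0, 0.0)), ("a2", (5.0, 5.0)), ("a3", (9.0, 1.0))]
    docs_b = [("b1", (0.5, 0.0)), ("b2", (5.0, 6.5)), ("b3", (20.0, 20.0))]
    pairs: List[Tuple[str, str]] = []
    join(build(docs_a), build(docs_b), eps=1.0, out=pairs)
    print(sorted(pairs))  # [('a1', 'b1')]
```

The pruning step is safe because the minimum distance between two bounding rectangles never exceeds the distance between any pair of points inside them, so a pruned node pair cannot hide a matching document pair; only the leaf-level candidates pay for an exact distance check.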
Similar resources
Embedding Similarity Joins into Native XML Databases
Similarity joins in databases can be used for several important tasks such as data cleaning and instance-based data integration. In this paper, we explore ways to support such tasks in a native XML database environment. The main goals of our work are: a) to prove the feasibility of performing tree similarity joins in a general-purpose XML database management system; b) to support string- and ...
Index XML Data Using Extended Order and Path Index
The eXtensible Markup Language (XML) is becoming a new standard for information representation and exchange over the Internet. How to index XML data for efficient query processing and XML transformation is an important subject in the XML community. In this paper, based on the extended preorder indexing method, we add path information as part of the index. It is shown that the number of path joins c...
Holistic Twig Joins on Indexed XML Documents
Finding all the occurrences of a twig pattern specified by a selection predicate on multiple elements in an XML document is a core operation for efficient evaluation of XML queries. Holistic twig join algorithms were proposed recently as an optimal solution when the twig pattern only involves ancestor-descendant relationships. In this paper, we address the problem of efficient processing of holi...
Approximate Geospatial Joins with Precision Guarantees
Geospatial joins are a core building block of connected mobility applications. Joins between streaming points and static polygons are an especially challenging problem. Since points are not known beforehand, they cannot be indexed. Nevertheless, points need to be mapped to polygons with low latencies to enable real-time feedback. We present an approximate geospatial join that guarantees a user-...
Indexing Schemes for Efficient Aggregate Computation over Structural Joins
With the increasing popularity of XML as a standard for data representation and exchange, efficient XML query processing has become a necessity. One popular approach encodes the hierarchical structure of XML data through a node numbering scheme, thus reducing typical queries to special forms (structural, path, twig) of containment joins. In this paper we consider how using an index can facilita...